Abstract


We study a risk sensitive control version of the lifetime ruin probability problem. We
consider a sequence of investment problems in a Black-Scholes market that includes a risky
asset and a riskless asset. We present a differential game that governs the limit behavior.
We solve it explicitly and use it to find an asymptotically optimal policy.
1 Introduction

The problem of how an individual should invest her wealth in a risky financial market in order
to minimize the probability of outliving her wealth, also known as the probability of lifetime
ruin, has been extensively analyzed; see e.g. [20], [27], [5], ID, [6], [7], m, and [8]. These works
fall naturally within the area of optimally controlling wealth to reach a goal. Research on this
topic goes back to the seminal work of m and continued with the work of [22], [ZD, [25], US],
m, [9], m, and [n].

In the standard Black-Scholes market that includes a risky asset and a riskless asset, the
case of interest is when the investor consumes more than the potential profit obtained
by investing the entire wealth in the riskless asset, that is $c(x) > rx$, in which $c(\cdot)$ is the
consumption function, $r$ is the constant riskless rate, and $x$ is the current wealth. The other
case is trivial, of course, since by investing the entire wealth in the riskless asset the wealth
cannot decrease and ruin is avoided. If $c(x) - rx \approx 0^+$, then the investor who
wishes to minimize the probability of lifetime ruin should invest almost all of her wealth in the
riskless asset. The probability of ruin would be small, yet positive. With the understanding
that lifetime ruin is a rare and dramatic event and that one should also avoid living close to the
ruin level, we study this case by using a risk sensitive control framework. The risk sensitive
control criterion penalizes such events heavily and therefore provides a natural way to address
these considerations.

We study the risk sensitive control via large deviations techniques. In [23], Pham provides 
some applications and methods of large deviations in finance and insurance. Among the 
studied models, he considers ruin problems when the initial reserve is large and therefore, the 
probability of ruin is small. We, on the other hand, study a lifetime ruin problem, which is 
a different problem, and via risk sensitive control with small noise, as described below, which 
yields a different analysis. 

In order to rigorously treat the mentioned case that $c(x) - rx \approx 0^+$ we consider a sequence
of models, indexed by $n \in \mathbb{N}$, that differ from each other only in the consumption function, in
a way that $c^n(x) - rx = O(1/n)$, where $n$ is a large parameter. By using an appropriate time
scaling we get a risk sensitive control problem with small noise as follows: The scaled wealth process
under the consumption function $c^n$ satisfies

$$dW^n(t) = b(W^n(t), \pi^n(t))\,dt + \frac{1}{\sqrt{n}}\,\sigma(W^n(t), \pi^n(t))\,dB(t), \quad t \ge 0, \qquad W^n(0) = x,$$

for some proper $b$ and $\sigma$, where $\pi^n$ is the investment policy, and $B$ is a standard Brownian
motion. The goal is to choose $\pi^n$ that minimizes

$$\frac{1}{n} \ln E\left[ e^{\,n\left[\int_0^{\tau^n \wedge \tau_d^n} l(W^n(s))\,ds + \rho 1_{\{\tau^n < \tau_d^n\}}\right]} \right],$$

where $\tau_d^n$ is the time of death, $\tau^n$ is the time of reaching the ruin level $a$, $\rho$ is a penalty for
lifetime ruin, and $l$ is a nonnegative non-increasing Lipschitz function that penalizes low wealth.
We present a differential game that governs the limiting behavior. We solve it explicitly and
use it to find an asymptotically optimal policy.

Risk sensitive control for controlled stochastic differential equations with small noise has
been studied, for example, in and [Hj. For a survey of the topic the reader
is referred to [15]. There are several approaches to this problem. Fleming
and Soner used differential equations tools and showed that the sequence of the appropriate
prelimit Hamiltonians converges to the Hamiltonian that is associated with the differential
game. Among other requirements, it is assumed that the terminal cost is continuous and
that the terminal time is fixed. In our case, the indicator takes the role of the terminal cost,
which, besides not being continuous, also depends on the history of the wealth
process. Also, we consider a random terminal time that is independent of the wealth process.
Moreover, partial differential equations techniques do not provide asymptotically optimal
policies, while our approach does.

In [13], Dupuis and Kushner approached a risk sensitive control problem of minimizing
escape time probabilities by techniques taken from the theory of large deviations. Some of their
requirements are that the drift and the diffusion coefficients, $b$ and $\sigma$ respectively, are bounded
and that the latter is also non-degenerate and does not depend on the control. These requirements
are essential for the proofs. Also, they use a fixed terminal time. In our model, besides the fact that the
terminal time is random, the drift and the diffusion coefficients are assumed to be Lipschitz,
but only the diffusion coefficient, $\sigma$, is assumed to be bounded. We allow $\sigma$ to be zero and to
depend on the control. In fact, under the asymptotically optimal policy that we suggest the
diffusion coefficient can be degenerate.

Recently, in [1] and [2] the authors considered a queueing network problem under the moderate
deviation heavy traffic regime. By using a variant of the proof of Varadhan's lemma and
some properties of the differential game, asymptotic optimality in the queueing systems
is shown. In these papers, the controlled stochastic processes are not diffusions, but they are
relatively close in distribution to a controlled diffusion with small noise. Therefore, the analysis
requires some additional tools, mainly the Skorohod mapping. While the structure
of the queueing network in the prelimit raises some difficulties, the approximating diffusion is
relatively simple and consists of Brownian motion (reflected Brownian motion, in the second
paper) with drift. Although our proof uses some measure change arguments and is inspired
by the proof of Varadhan's lemma, in contrast to [1] and [2], we need to work with a
controlled diffusion process.

Regarding the random terminal time, the cost function can be viewed as a discounted
version of the risk sensitive cost. The only model among the above that considers a similar
discounted structure is [2]. However, unlike the mentioned paper, we consider a scaled discount
factor. The differential game associated with [2] appears in [3] and, as in our case, the optimal
solution of the game is time-homogeneous. Motivated by this property, we analyze discounted
risk sensitive control with small noise diffusions further in a future paper.

Let us summarize the contribution of this paper: 

• We propose a risk-sensitive cost for a lifetime ruin problem, which can be expressed as a 
discounted risk sensitive cost. We present a differential game that governs the limiting 
behavior. 

• We solve the differential game explicitly, including finding an optimal policy for the 
minimizer that leads to an asymptotically optimal policy in the prelimit stochastic model. 

• Our assumptions over the diffusion process are weaker than what usually appears in the 
literature, and yet we manage to find an asymptotically optimal control. 

The organization of the paper is as follows. In Section 2 we describe the model, introduce
the differential game, and state the main results. In Section 3 we analyze the differential
game, present a Hamilton-Jacobi-Bellman (HJB) equation, characterize the differential
game's value function as its unique solution, and provide an explicit expression for the value
function. Then we present an explicit optimal control for the minimizer, and a simple control
for the maximizer that achieves the value function. In Section 4 we prove the main result by
showing that in the limit the differential game describes the stochastic model.

We close this section by introducing some frequently used notation. 

Notation. We denote $[0,\infty)$ by $\mathbb{R}_+$. For $f : [0,t] \to \mathbb{R}$ let $|f|_t := \sup_{0 \le s \le t} |f(s)|$. For any
interval $I$ denote by $AC(I)$ and $C(I)$ the spaces of absolutely continuous functions (resp., continuous
functions) mapping $I \to \mathbb{R}$. Write $AC_0(I)$ and $C_0(I)$ for the subsets of the corresponding
function spaces, of functions that start at zero.
2 Model and results

2.1 The stochastic model 

We consider a sequence of stochastic models, indexed by $n \in \mathbb{N}$, of an investor who trades
continuously in a Black-Scholes type financial market with no transaction costs. We allow
borrowing and short-selling. The price of the riskless asset follows

$$dV(t) = rV(t)\,dt,$$

where $r > 0$ is the constant interest rate. The risky asset follows a geometric Brownian motion:

$$dS(t) = S(t)\left[\mu\,dt + \sigma\,dB(t)\right],$$

where $\mu > r$ and $\sigma > 0$ are constants and $(B(t))_{t \ge 0}$ is a standard Brownian motion. For reasons
that will become clear below, we define a sequence of consumption functions, indexed by $n$. For
any given $n \in \mathbb{N}$ we assume that the consumption function takes the form $c^n(\cdot) = r\,\cdot + \frac{1}{n} e(\cdot)$, for
some function $e : [a,\infty) \to \mathbb{R}$. We assume that $e(\cdot)$ is a Lipschitz function and that there is
a positive constant $M_0$ such that $e(\cdot) \le M_0$. For every $n \in \mathbb{N}$ and at any given time $t \ge 0$
let $K^n(t)$ be the amount of money that is invested in the risky asset. Then the wealth process
$\tilde W^n$ satisfies

$$d\tilde W^n(t) = \left(r\tilde W^n(t) - c^n(\tilde W^n(t)) + (\mu - r)K^n(t)\right)dt + \sigma K^n(t)\,dB(t), \quad t \ge 0, \qquad \tilde W^n(0) = x.$$

Now, by using time scaling and by setting $W^n(\cdot) = \tilde W^n(n\,\cdot)$, where $\tilde W^n$ denotes the unscaled wealth process, we get that

$$dW^n(t) = \left(-e(W^n(t)) + (\mu - r)\pi^n(t)\right)dt + \frac{1}{\sqrt{n}}\,\sigma\pi^n(t)\,dB(t), \quad t \ge 0, \qquad W^n(0) = x, \tag{2.1}$$

where $\pi^n(\cdot) = nK^n(n\,\cdot)$. In what follows we will denote by $\{\mathcal{F}_t\}$ the usual augmentation
of the natural filtration generated by the Brownian motion in (2.1). From now onwards, we
refer to $\pi^n$ as the control. We denote by $\Pi = \Pi_{M_1}$ the collection of all progressively measurable
processes $(\pi(t))_{t \ge 0}$ such that $|\pi(\cdot)| \le M_1$, which we refer to as admissible policies, where $M_1$
is a positive constant. We take $\pi^n \in \Pi$. By the assumptions on $e(\cdot)$ and $\pi^n$ it follows that
for every $x > 0$, the above admits a unique solution. For every $n \in \mathbb{N}$, denote by $\tau^n$ the first
time that $W^n$ reaches $a \in (0, x)$, which we will refer to as the ruin level. The investor would
like to avoid ruin during her lifetime and also to avoid living for long close to the ruin level. Also,
let $\tau_d^n$ be the investor's random time of death. Due to the time scaling, we assume that
$\tau_d^n$ is exponentially distributed with parameter $\lambda n$ and independent of the wealth process. The goal of the investor is to minimize the
following risk sensitive control cost:

$$J^n(x,\pi^n) := \frac{1}{n}\ln E\left[ e^{\,n\left[\int_0^{\tau^n\wedge\tau_d^n} l(W^n(s))\,ds + \rho 1_{\{\tau^n<\tau_d^n\}}\right]} \right]
= \frac{1}{n}\ln E\left[ \int_0^\infty e^{-\lambda n t}\, e^{\,n\left[\int_0^{\tau^n\wedge t} l(W^n(s))\,ds + \rho 1_{\{\tau^n<t\}}\right]}\,dt \right] + \frac{1}{n}\ln(\lambda n),$$
where $\rho > 0$ stands for the punishment cost for ruin and $l : [a,\infty) \to [0,\lambda)$ is a non-increasing
function. The function $l$ represents a punishment for the investor when her wealth
is close to the ruin level $a$. Naturally, we would like to give a higher punishment when the
wealth is closer to $a$. Moreover, since we would like that, given $n$, the cost be
decreasing with respect to (w.r.t.) the wealth, we require that $l(\cdot) < \lambda$. Otherwise, $J^n$ would
be increasing around $a$; this would represent a situation in which the punishment for living close to
the ruin level dominates the punishment for being ruined. Notice also that the last term on
the right-hand side (r.h.s.) of the above goes to zero as $n$ goes to infinity. For this reason we
will ignore it in the analysis in Section 4. We summarize the assumptions mentioned above:

Assumption 2.1 Set the following constants $\mu > r > 0$, $a, M_0, \sigma, \lambda > 0$. Also, let $e : [a,\infty) \to
(-\infty, M_0]$ be Lipschitz and $l : [0,\infty) \to [0,\lambda)$ be a Lipschitz and non-increasing function.

The assumption is in force throughout the paper.
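
To make the scaled dynamics (2.1) concrete, the following is a minimal Euler-Maruyama sketch of one wealth path under a feedback policy, stopped at the ruin level. It is only an illustration: the function name `simulate_scaled_wealth`, the step size `dt`, the seed, and the choice of a feedback policy `policy(w)` are assumptions made for the example and are not part of the model above.

```python
import math
import random


def simulate_scaled_wealth(x, e, policy, mu, r, sigma, n, horizon, dt, a, seed=0):
    """Euler-Maruyama path of the scaled wealth equation (2.1).

    dW = (-e(W) + (mu - r) * pi) dt + sigma * pi / sqrt(n) dB,  W(0) = x,
    with pi = policy(W) (a feedback policy, assumed for illustration).
    Returns (wealth, ruin_time); ruin_time is None if the ruin level `a`
    is not reached before `horizon`.
    """
    rng = random.Random(seed)
    w, t = x, 0.0
    steps = int(round(horizon / dt))
    for _ in range(steps):
        if w <= a:
            return a, t  # ruin: the path is stopped at the level a
        pi = policy(w)
        drift = -e(w) + (mu - r) * pi
        # Brownian increment over dt, scaled by the small-noise factor 1/sqrt(n)
        noise = sigma * pi * rng.gauss(0.0, math.sqrt(dt)) / math.sqrt(n)
        w += drift * dt + noise
        t += dt
    return (a, t) if w <= a else (w, None)
```

With `policy = lambda w: 0.0` the noise term vanishes and the path follows the deterministic decay $\dot W = -e(W)$, which is a convenient sanity check before experimenting with risky positions.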

We study the problem as $n \to \infty$. As mentioned in the introduction, both the prelimit
stochastic model and the limit suffer from several complexities in the analysis. First, the
indicator part of the cost function complicates the analysis because it depends on the history
of the process and, if we look at it as a terminal cost, it is not continuous w.r.t. the
terminal wealth. Second, we study a discounted version of the risk sensitive cost. To the best
of our knowledge such a formulation has been studied before only in [2], in a queueing system
framework and with a discount that is free of $n$. Third, the diffusion coefficient is not necessarily
bounded away from zero and it depends on the control. In fact, as is shown in Section 2.3, the
asymptotically optimal policy may become zero and therefore so does the volatility coefficient.
Therefore, the Hamiltonian method of [m, Chapter XI.7] and the change of measure method of [14]
do not work here. We find an asymptotically optimal policy for the problem by studying a
differential game. We show that as $n \to \infty$ the optimal risk sensitive cost function converges
to the value of the game, and that an asymptotically optimal policy can be deduced from the
minimizer's optimal control in the game.


2.2 Differential game setting 

In this section, inspired by [m, Chapter XI.7] and [12, Theorem 5.6.7], we describe a differential
game associated with the optimal risk sensitive control problem. We denote by $\Pi = \Pi_{M_1}$ the
set of all Lipschitz functions $\pi : [a,\infty) \to [-M_1, M_1]$. Given $\pi \in \Pi$ and $\psi \in AC_0[0,\infty)$, the
state process associated with the initial condition $x$ and the data $\psi$ and $\pi$ is given by

$$\dot\varphi(t) = -e(\varphi(t)) + (\mu - r)\pi(\varphi(t)) + \sigma\pi(\varphi(t))\dot\psi(t), \quad t \ge 0, \qquad \varphi(0) = x. \tag{2.2}$$

One can easily verify that the state process is well-defined, see [m, Theorem 19.12]. Note the
analogy between the above and (2.1). The game payoff is

$$\sup_{T \in [0,\infty)} \left[ \int_0^{T\wedge\tau} \left[-\lambda + l(\varphi(t))\right]dt - I(T\wedge\tau, \psi) + \rho 1_{\{\tau \le T\}} \right],$$

where $\tau$ is the first time that the state process hits the ruin level $a$ and for every $t > 0$, $I(t,\cdot)$
is a function mapping $C[0,t]$ to $\mathbb{R}_+ \cup \{+\infty\}$ defined as

$$I(t,\psi) := \begin{cases} \dfrac{1}{2}\displaystyle\int_0^t \dot\psi^2(s)\,ds & \text{if } \psi \in AC_0[0,t], \\ +\infty & \text{otherwise.} \end{cases}$$

The function $I$ is the rate function^1 of the Brownian motion as $n \to \infty$, see [12,
Theorem 5.2.3]. The "$\sup_{T\in[0,\infty)}$" is the differential game analogue of the control problem's
discount factor, $\lambda n$. The payoff is maximized over $\psi$ and minimized over $\pi$. By the definition
of the function $I$ we may restrict the maximizer only to $\psi \in AC_0[0,\infty)$.

The control $\pi \in \Pi$ is taken to be a feedback control and $\psi \in AC_0[0,\infty)$ and $T \in \mathbb{R}_+$ are
open-loop controls. We call $\psi$ the path part of the control and $T$ the termination time part
of the control. Given $x \in [a,\infty)$, $\pi \in \Pi$, $\psi \in AC_0[0,\infty)$, and $T \in \mathbb{R}_+$, we define the cost until
time $T$ by

$$C(x,\pi,\psi,T) := \int_0^{T\wedge\tau} \left[-\lambda + l(\varphi(t)) - \frac{1}{2}\dot\psi^2(t)\right]dt + \rho 1_{\{\tau \le T\}}. \tag{2.3}$$

The value of the game is defined by

$$U(x) := \inf_{\pi\in\Pi}\ \sup_{\psi\in AC_0[0,\infty),\,T\in\mathbb{R}_+} C(x,\pi,\psi,T). \tag{2.4}$$


In the remark below we show that the maximizer can be restricted to a smaller set of controls 
without any loss. This property serves us in the sequel. 


Remark 2.1 (1) Since $l(\cdot) < \lambda$ it follows that for every $\psi \in AC_0[0,\infty)$ and every $T \in \mathbb{R}_+$ one
has $\int_0^{T\wedge\tau}[-\lambda + l(\varphi(t)) - \frac{1}{2}\dot\psi^2(t)]\,dt \le 0$. Therefore, without any loss for the maximizer, she can
be restricted to $\psi$'s under which $\tau < \infty$ and to $T \in \{0,\tau\}$.

(2) Notice moreover that the maximizer can also be restricted to $\psi$'s for which the state process
satisfies $\varphi(t) \le \varphi(0) =: x$ for every $t \ge 0$ and by (1) above also $\tau < \infty$. Indeed, since the
integrand on the r.h.s. of (2.3) is negative, the only way that $U$ is positive is in case that
$\tau < \infty$. Let $\psi = \psi_\pi$ be such that $\tau < \infty$. Denote by $\tau_x$ the last time before time $\tau$ that
$\varphi(t) = x$. Then,

$$\begin{aligned}
C(x,\pi,\psi,\tau) &= \int_0^{\tau}\left[-\lambda + l(\varphi(t)) - \tfrac{1}{2}\dot\psi^2(t)\right]dt + \rho \\
&\le \int_{\tau_x}^{\tau}\left[-\lambda + l(\varphi(t)) - \tfrac{1}{2}\dot\psi^2(t)\right]dt + \rho \\
&= \int_0^{\tau-\tau_x}\left[-\lambda + l(\varphi_x(t)) - \tfrac{1}{2}(\dot\psi_x)^2(t)\right]dt + \rho \\
&= C(x,\pi,\psi_x,\tau-\tau_x),
\end{aligned}$$

where $\psi_x(\cdot) := \psi(\tau_x + \cdot)$ and $\varphi_x(\cdot) := \varphi(\tau_x + \cdot)$. The last equality follows since $\tau - \tau_x$ is the first
time that the state process $\varphi_x$ hits $a$. That is, $\psi_x$ generates a greater payoff for the maximizer
and the associated state process does not cross $x$ upwards. Therefore, for every $x \in [a,b]$

$$U(x) = \max\left\{0,\ \inf_{\pi\in\Pi}\ \sup_{\psi\in\mathcal{A}_{x,\pi}} C(x,\pi,\psi,\tau)\right\}, \tag{2.5}$$

where from now onwards $\mathcal{A}_{x,\pi}$ is the restriction to absolutely continuous $\psi$'s that satisfy the
conditions mentioned in (2) above.

^1 Although we are not using large deviation arguments explicitly in the paper, for the sake of intuition we still
choose to define the cost by using the rate function instead of simply using $\frac{1}{2}\int_0^{\cdot}\dot\psi^2(s)\,ds$.


2.3 Main results 


We now present the main theorem, which states that the value functions of the
stochastic model converge to the value function of the game. Moreover, we state an asymptotically
optimal policy for the stochastic model.

For every $n \in \mathbb{N}$, set the stochastic control $\pi^*(t) = \pi^{*,n}(t) = \pi^*(W^n(t))$, $t \ge 0$, where $\pi^*$
is the function

$$\pi^*(x) = \begin{cases} \dfrac{(\mu - r)\,e(x)}{\sigma^2\left(\lambda - l(x) + \frac{1}{2}\left(\frac{\mu-r}{\sigma}\right)^2\right)}, & a \le x < d, \\[2ex] 0, & d \le x, \end{cases} \tag{2.6}$$

where

$$d := b \wedge \inf\left\{ y > a : \rho - \int_a^y \frac{\lambda - l(u) + \frac{1}{2}\left(\frac{\mu-r}{\sigma}\right)^2}{e(u)}\,du = 0 \right\} \tag{2.7}$$

and^2

$$b := \inf\{x > a : e(x) \le 0\}.$$

By the definition of $b$ and since $e(\cdot) \le M_0$ it follows that $\pi^* \in \Pi_{M_1}$ for some suitable $M_1$.

We will show that the control $\pi^*$ is an optimal control for the minimizer in the differential
game and that the value function, $U$, is given by

$$U(x) = \begin{cases} \rho - \displaystyle\int_a^x \frac{\lambda - l(u) + \frac{1}{2}\left(\frac{\mu-r}{\sigma}\right)^2}{e(u)}\,du, & a \le x < d, \\[2ex] 0, & d \le x. \end{cases} \tag{2.8}$$

Since $U(x) = 0$ for every $x \ge d$, the parameter $d$ is referred to as the "safe level". Notice that the
punishment cost $\rho$ affects $\pi^*$ and $U$ through the parameter $d$, which increases as a function of
$\rho$. Therefore, if $\rho$ is higher, the safe level is greater. Clearly, $\rho$ also enters $U$ directly and linearly
on $[a, d)$.
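
As a numerical illustration of (2.6)-(2.8), the sketch below tabulates the value function $U$, the safe level $d$, and the policy $\pi^*$ on a grid. It is a hedged example, not part of the paper: the name `solve_game`, the trapezoid rule, the grid size, and the simplifying assumption that $e > 0$ on the whole grid $[a, x_{\max}]$ (so that $b$ lies beyond the grid) are all choices made here for illustration.

```python
import bisect


def solve_game(e, l, mu, r, sigma, lam, rho, a, x_max, steps=10000):
    """Tabulate U from (2.8) and locate the safe level d from (2.7).

    Assumes e > 0 on [a, x_max]; if the integral does not exhaust rho on the
    grid, d is truncated to x_max (an artifact of this sketch).
    Returns (U, pi_star, d), where U and pi_star are callables on [a, inf).
    """
    c2 = 0.5 * ((mu - r) / sigma) ** 2

    def g(u):
        # integrand of (2.7) and (2.8)
        return (lam - l(u) + c2) / e(u)

    h = (x_max - a) / steps
    xs, vals = [a], [rho]
    d = x_max
    for i in range(1, steps + 1):
        left, right = a + (i - 1) * h, a + i * h
        nxt = vals[-1] - 0.5 * h * (g(left) + g(right))  # trapezoid step
        if nxt <= 0.0:
            # linear interpolation of the zero crossing gives d
            d = left + h * vals[-1] / (vals[-1] - nxt)
            break
        xs.append(right)
        vals.append(nxt)

    def U(x):
        if x >= d:
            return 0.0
        i = bisect.bisect_right(xs, x) - 1
        # first-order extension from the nearest grid node to the left
        return max(vals[i] - (x - xs[i]) * g(xs[i]), 0.0)

    def pi_star(x):
        if x >= d:
            return 0.0
        return (mu - r) * e(x) / (sigma ** 2 * (lam - l(x) + c2))

    return U, pi_star, d
```

For constant $e$ and $l$ the integrand is constant, so $U$ is linear on $[a,d)$ and $d = a + \rho/g$; this special case is a quick way to check the output.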

The next theorem connects the game and the stochastic model.

Theorem 2.1 (Main Result) Let $U^n(x) := \inf_{\pi\in\Pi} J^n(x,\pi)$. For every $x > a$ one has
$\lim_{n\to\infty} U^n(x) = U(x)$, and moreover, $\lim_{n\to\infty} J^n(x,\pi^*) = U(x)$.

The proof is given in Section 4.

^2 We use the convention that $\inf\emptyset = \infty$. Also, hereafter, in case that $b = \infty$ the notations $(x,b]$ and
$[x,b]$ mean $(x,\infty)$ and $[x,\infty)$ respectively.
3 Solution and analysis of the game


In this section we provide a solution of the game. We start with some basic properties of the
value function in Section 3.1. In Section 3.2 we present the HJB equation and a verification
lemma. Then we derive the explicit expressions for $U$ and the optimal control. Finally, in
Section 3.3 we provide a simple control for the maximizer, which assures her the payoff $U(x)$.


3.1 Basic properties 

We begin by providing some basic properties that the value function satisfies. These properties 
are used in the verification lemma below. 


Lemma 3.1 The function $U$, defined in (2.4), satisfies the following conditions:

i. $0 \le U(x) \le \rho$, $x \in [a,\infty)$, and $U(a) = \rho$.

ii. For every $x \ge b$ one has $U(x) = 0$.

iii. $U$ is non-increasing.

Due to part ii of the lemma, in the sequel we analyze $U$ only on the interval $[a,b)$.

Proof of Lemma 3.1. i. By choosing $T = 0$, the maximizer can guarantee $U \ge 0$. On the
other hand, since $l(\cdot) - \lambda < 0$, clearly $U(\cdot) \le \rho$. By choosing $T = 0$, one easily gets that
$U(a) = \rho$.

ii. We show that when $x \ge b$, taking $\pi = 0$ we can avoid ruin. Indeed, if $\pi = 0$, then $\dot\varphi = -e(\varphi)$,
$\varphi(0) = x \ge b$. In case that $x = b$ then $\varphi(\cdot) = b$. In case that $x > b$, since the function $e(\cdot)$
is Lipschitz we get by the Picard-Lindelöf theorem that there is a unique $\varphi \in C^1[0,\infty)$ that solves
the ordinary differential equation mentioned above. One can easily verify that once $\varphi$
reaches the level $b$ it remains at this level from this time onwards. Therefore, $\varphi \ge b$.

iii. Fix $x \in (a,\infty)$ and $y > x$ and set $\varphi(0) = y$. Let $\tau_x := \inf\{t \ge 0 : \varphi(t) = x\}$. Using the
dynamic programming principle along with the fact that $l(\cdot) < \lambda$, we obtain that

$$U(y) = \inf_{\pi\in\Pi}\ \sup_{\psi,\,T} \left[ \int_0^{T\wedge\tau_x} \left[-\lambda + l(\varphi(t)) - \frac{1}{2}\dot\psi^2(t)\right]dt + U(x) \right] \le U(x).$$

□


3.2 The HJB equation 

In this section we prove that equation (2.8) holds. We start with a verification lemma in
which we provide the HJB (or rather the Isaacs) equation for the problem. Then we present
a solution for this equation. Recall that by Lemma 3.1 ii. $U(x) = 0$ for $x \ge b$. Therefore, we
limit ourselves to the interval $[a,b)$.

Lemma 3.2 (Verification Lemma) Let $V : [a,b) \to [0,\rho]$, with $V(a) = \rho$, be a non-increasing,
continuous function that is differentiable on $(a,\beta)$, where $\beta := b \wedge \inf\{x > a :
V(x) = 0\}$. Assume that the following conditions hold:

(i) [The HJB equation] For every $x \in [a,\beta)$ one has

$$\inf_{p\in\mathbb{R}}\ \sup_{\theta\in\mathbb{R}} \left\{ V'(x)\left(-e(x) + (\mu - r)p + \sigma p\theta\right) - \lambda + l(x) - \frac{1}{2}\theta^2 \right\} = 0; \tag{3.1}$$

(ii) Let $P(x) = -\dfrac{\mu - r}{\sigma^2 V'(x)}$. Then for every $\psi \in AC_0$ and every $x \in [a,\beta)$, there exists a unique
solution to (2.2) when we replace $\pi$ with $P$.

(iii) Let $\Theta(p,x) = \sigma p V'(x)$. Then for every $\pi \in \Pi$, and $a \le x < \beta$, there exists a unique
solution to

$$\dot\varphi(t) = -e(\varphi(t)) + (\mu - r)\pi(\varphi(t)) + \sigma\pi(\varphi(t))\,\Theta(\pi(\varphi(t)),\varphi(t)), \quad t \in [0,\tau_a], \qquad \varphi(0) = x. \tag{3.2}$$

Then $U = V$ on $[a,b)$. Moreover, the function $P$ is an optimal feedback control.

Notice that we defined the HJB equation only on the interval [a,/3). This structure follows 
since for every x for which U(x) = 0, under optimality of both players, the time part of the 
maximizer’s control equals zero and the game is terminated immediately. 
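
The saddle point behind (3.1) can be checked numerically. The sketch below is an illustration under stated assumptions: the function names `hamiltonian` and `saddle_point` are hypothetical, and the formulas for the candidate saddle point are the first-order conditions of the expression inside (3.1), namely $\theta = \sigma p V'(x)$ for the maximizer and $p = -(\mu - r)/(\sigma^2 V'(x))$ for the minimizer, valid when $V'(x) < 0$.

```python
def hamiltonian(v_prime, p, theta, e_x, l_x, mu, r, sigma, lam):
    """Expression inside the inf-sup of the HJB equation (3.1) at a point x."""
    return (v_prime * (-e_x + (mu - r) * p + sigma * p * theta)
            - lam + l_x - 0.5 * theta ** 2)


def saddle_point(v_prime, mu, r, sigma):
    """Candidate saddle point (p*, theta*) from the first-order conditions.

    Requires v_prime < 0. theta* simplifies to -(mu - r) / sigma, which does
    not depend on v_prime.
    """
    p_star = -(mu - r) / (sigma ** 2 * v_prime)
    theta_star = sigma * p_star * v_prime
    return p_star, theta_star
```

Plugging in $V'(x) = -\big(\lambda - l(x) + \tfrac12((\mu-r)/\sigma)^2\big)/e(x)$, the derivative implied by (2.8), should make the Hamiltonian vanish at the saddle point, which is how the explicit formula for the value function can be sanity-checked.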

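Proof of Lemma 3.2. 1. As a first step we will prove that for every $x \in [a,\beta)$ one has
$V(x) \ge U(x)$. As a result, if $\beta < b$ then $0 = V(\beta) \ge U(\beta) \ge 0$, where the last inequality
follows by the first assertion of Lemma 3.1. Since $V \ge 0$ and $U$ is non-increasing we get that
$V \ge U$ on $[a,b)$.

Fix $x \in [a,\beta)$. Set the control $\pi^* = P$. Also, fix a control $\psi \in \mathcal{A}_{x,\pi^*}$ and denote by $\varphi^*$ the
state process associated with $\pi^*$ and $\psi$. Recall that by the definition of $\mathcal{A}_{x,\pi^*}$, for every $t \ge 0$,
one has $\varphi^*(t) \le \varphi^*(0) = x$ and that $\tau^* < \infty$, where $\tau^*$ is the first time that $\varphi^*$ reaches $a$.
Recalling moreover that $x < \beta$ we get that for every $t \ge 0$, $\varphi^*(t) < \beta$. Since $V$ is differentiable
on $[a,\beta)$ we can apply the chain rule to $V$ and get

$$\rho - V(x) = V(\varphi^*(\tau^*)) - V(\varphi^*(0)) = \int_0^{\tau^*} V'(\varphi^*(t))\,\dot\varphi^*(t)\,dt.$$

Using again the inequality $\varphi^*(t) < \beta$ we get by conditions (i) and (ii) that

$$\begin{aligned}
0 &= \sup_{\theta\in\mathbb{R}} \left\{ V'(\varphi^*(t))\left[-e(\varphi^*(t)) + (\mu - r)\pi^*(\varphi^*(t)) + \sigma\pi^*(\varphi^*(t))\theta\right] - \lambda + l(\varphi^*(t)) - \tfrac{1}{2}\theta^2 \right\} \\
&\ge V'(\varphi^*(t))\left[-e(\varphi^*(t)) + (\mu - r)\pi^*(\varphi^*(t)) + \sigma\pi^*(\varphi^*(t))\dot\psi(t)\right] - \lambda + l(\varphi^*(t)) - \tfrac{1}{2}\dot\psi^2(t).
\end{aligned}$$

So we have

$$V(x) \ge \int_0^{\tau^*} \left[-\lambda + l(\varphi^*(t)) - \tfrac{1}{2}\dot\psi^2(t)\right]dt + \rho = C(x,\pi^*,\psi,\tau^*). \tag{3.3}$$

By taking first $\sup_{\psi\in\mathcal{A}_{x,\pi^*}}$ and then $\inf_{\pi\in\Pi}$ on both sides, we get that

$$V(x) \ge \inf_{\pi\in\Pi}\ \sup_{\psi\in\mathcal{A}_{x,\pi}} C(x,\pi,\psi,\tau).$$

By the above and recalling that $V \ge 0$ we get by (2.5) that $V(x) \ge U(x)$.

2. By using the assumption that $V$ is non-increasing and that $U \ge 0$ it follows that it is
sufficient to prove that $V \le U$ on $[a,\beta)$. Fix $x \in [a,\beta)$. For every $\pi \in \Pi$ denote
$\Theta(\pi(\varphi^*(t)),\varphi^*(t))$ by $\dot\psi^*(t)$, where $\varphi^*$ solves (3.2). By (iii) $\varphi^*$ reaches $a$ in a finite time, that
is, $\tau_a < \infty$. Using conditions (i) and (ii) we get that

$$\begin{aligned}
0 &= \inf_{p\in\mathbb{R}} \left\{ V'(\varphi^*(t))\left[-e(\varphi^*(t)) + (\mu - r)p + \sigma p\,\Theta(p,\varphi^*(t))\right] - \lambda + l(\varphi^*(t)) - \tfrac{1}{2}\Theta^2(p,\varphi^*(t)) \right\} \\
&\le V'(\varphi^*(t))\left[-e(\varphi^*(t)) + (\mu - r)\pi(\varphi^*(t)) + \sigma\pi(\varphi^*(t))\dot\psi^*(t)\right] - \lambda + l(\varphi^*(t)) - \tfrac{1}{2}(\dot\psi^*)^2(t).
\end{aligned} \tag{3.4}$$

Recalling that

$$\dot\varphi^*(t) = -e(\varphi^*(t)) + (\mu - r)\pi(\varphi^*(t)) + \sigma\pi(\varphi^*(t))\dot\psi^*(t),$$

we get that

$$V(x) \le \int_0^{\tau_a} \left[-\lambda + l(\varphi^*(t)) - \tfrac{1}{2}(\dot\psi^*)^2(t)\right]dt + \rho.$$

By taking $\sup_{\psi\in\mathcal{A}_{x,\pi}}$ first and then $\inf_{\pi\in\Pi}$ on both sides, we get that

$$V(x) \le \inf_{\pi\in\Pi}\ \sup_{\psi\in\mathcal{A}_{x,\pi}} C(x,\pi,\psi,\tau).$$

By the definition of $\beta$ and since $x < \beta$ it follows that $V(x) > 0$. Therefore,

$$\inf_{\pi\in\Pi}\ \sup_{\psi\in\mathcal{A}_{x,\pi}} C(x,\pi,\psi,\tau) > 0$$

and by (2.5) it follows that $U(x) = \inf_{\pi\in\Pi}\sup_{\psi\in\mathcal{A}_{x,\pi}} C(x,\pi,\psi,\tau)$. Therefore, $V(x) \le U(x)$.
Now, the optimality of the feedback control $P$ follows from (3.3). □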


We now use the verification lemma in order to provide an explicit expression for the value 
function U. 


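Proposition 3.1 Let

$$V(x) = \begin{cases} \rho - \displaystyle\int_a^x \frac{\lambda - l(u) + \frac{1}{2}\left(\frac{\mu-r}{\sigma}\right)^2}{e(u)}\,du, & a \le x < d, \\[2ex] 0, & d \le x, \end{cases} \tag{3.5}$$

where $d$ is defined in (2.7). Then $V = U$, where $U$ is defined in (2.4). Moreover, the control $\pi^*$ given in
(2.6) is optimal.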
Proof: Recall that by the second assertion of Lemma 3.1 for every $x \ge b$, $U(x) = 0$. We
now prove that (3.5) holds for $x \in [a,b)$ using Lemma 3.2. Notice that the parameter $\beta$ that
appears in the verification lemma is actually $d$ for our particular function $V$. First, it is easy
to check that $V$ satisfies (3.1). Next, since $P : [a,d] \to \mathbb{R}$ given by

$$P(x) = -\frac{\mu - r}{\sigma^2 V'(x)} = \frac{(\mu - r)\,e(x)}{\sigma^2\left(\frac{1}{2}\left(\frac{\mu-r}{\sigma}\right)^2 + \lambda - l(x)\right)}$$

is (locally) Lipschitz, requirement ii. of the verification lemma also holds.

Now we will verify the third condition in the verification lemma. Notice that (3.2) admits
a unique solution on the time interval $[0, \tau_a \wedge \tau_d]$, where $\tau_d$ is the first time that $\varphi$ hits $d$. The
equation in (3.2) admits a unique solution on $[0, \tau_a \wedge \tau_d]$ since (a) on this time interval $\varphi \in [a,d]$
and the functions $e$, $\pi$, and $V'$ are bounded on the interval $[a,d]$ and (b) $e$ is Lipschitz, and $\pi$
and $V'$ are locally Lipschitz.

Next, we will show that the above argument can be upgraded to the interval $[0, \tau_a]$. Observe
that the inequality in (3.4) holds with $\dot\psi(\cdot) := \Theta(\pi(\varphi(\cdot)),\varphi(\cdot))$ and $\varphi(t)$ replacing $\dot\psi^*(t)$ and
$\varphi^*(t)$. Moreover, since $V$ is non-increasing, we get that $V'(\varphi(t)) \le 0$ and together with (3.4)
in this case, actually $V'(\varphi(t)) < 0$ on $t \in [0, \tau_a \wedge \tau_d]$. Hence,

$$\dot\varphi(t) = -e(\varphi(t)) + (\mu - r)\pi(\varphi(t)) + \sigma\pi(\varphi(t))\dot\psi(t) \le -\frac{\lambda - l(\varphi(t)) + \frac{1}{2}\dot\psi^2(t)}{|V'(\varphi(t))|} < 0.$$

Thus, $\varphi$ does not cross $\varphi(0)$ upwards and therefore $\tau_d = \infty$, which implies (3.2).

As a final step we will show that $\varphi$ hits $a$ in a finite time, as a result of which we will obtain
that $\psi \in \mathcal{A}_{x,\pi}$. Assume to the contrary that $\tau_a = \infty$; then by using again (3.4) in
our case, we conclude that for every $s \ge 0$ one has

$$V(\varphi(0)) - V(\varphi(s)) \le \int_0^s \left[-\lambda + l(\varphi(t)) - \tfrac{1}{2}\dot\psi^2(t)\right]dt.$$

Since $l(\cdot) \le l(a) < \lambda$ we get that the r.h.s. of the above goes to $-\infty$ as $s \to \infty$, which contradicts
the fact that $V$ is bounded.

The optimality of $\pi^*$ follows by the verification lemma and since $U = V$.

□


3.3 Saddle point property 

Here, we provide a control (for the maximizer) that is independent of $\pi$ and that assures her
the payoff $U(x)$. The simplicity of this control will be crucial in Section 4.1. Set

$$\dot\psi^*(t) = -\frac{\mu - r}{\sigma}, \quad t \ge 0, \tag{3.6}$$

and let $T^* = \tau$ in case $x < d$ and $T^* = 0$ otherwise. Notice that $\psi^*$ is independent of the
control $\pi$. Moreover, notice that under $\psi^*$ the state process satisfies $\dot\varphi = -e(\varphi)$ and therefore
$\varphi$ and $T^*$ are also independent of the choice of $\pi$.
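Proposition 3.2 For every $x \in [a,\infty)$ one has $U(x) = \inf_{\pi\in\Pi} C(x,\pi,\psi^*,T^*)$. Moreover,

$$T^* \le \frac{\rho - U(x)}{\lambda - l(a) + \frac{1}{2}\left(\frac{\mu-r}{\sigma}\right)^2}. \tag{3.7}$$

Proof: For $x \ge d$ one has $U(x) = 0$ and by definition $T^* = 0$, so that $\inf_{\pi} C(x,\pi,\psi^*,T^*) = 0$.
Set $x < d$.

Under $\psi^*$, the state process is independent of $\pi$. Hence, for every $\pi \in \Pi$

$$\begin{aligned}
C(x,\pi,\psi^*,T^*) &= \int_0^{T^*} \left[-\lambda + l(\varphi(t)) - \tfrac{1}{2}\left(\tfrac{\mu-r}{\sigma}\right)^2\right]dt + \rho \\
&= -\int_a^x \frac{\lambda - l(u) + \frac{1}{2}\left(\frac{\mu-r}{\sigma}\right)^2}{e(u)}\,du + \rho \\
&= U(x),
\end{aligned}$$

where the second equality follows by the change of variables $u = \varphi(t)$ (so that $du = -e(u)\,dt$) and the last equality
follows by Proposition 3.1. Hence, the first part of the proposition is proved.

Since $l$ is non-increasing we get by (2.5) that

$$\begin{aligned}
0 \le U(x) &= \int_0^{T^*} \left[-\lambda + l(\varphi(t)) - \tfrac{1}{2}\left(\tfrac{\mu-r}{\sigma}\right)^2\right]dt + \rho \\
&\le \int_0^{T^*} \left[-\lambda + l(a) - \tfrac{1}{2}\left(\tfrac{\mu-r}{\sigma}\right)^2\right]dt + \rho \\
&= -T^*\left[\lambda - l(a) + \tfrac{1}{2}\left(\tfrac{\mu-r}{\sigma}\right)^2\right] + \rho
\end{aligned}$$

and (3.7) follows.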


□ 
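
The maximizer's control (3.6) turns the state equation into the ordinary differential equation $\dot\varphi = -e(\varphi)$, so both $T^*$ and the payoff $C(x,\pi,\psi^*,T^*)$ can be approximated by a simple Euler scheme and compared with the bound (3.7). The sketch below is illustrative only: the names `maximizer_payoff` and `termination_bound`, the step size, and the iteration cap are assumptions of this example, and it presumes $e > 0$ on $[a, x]$ (that is, $x < d \le b$).

```python
def maximizer_payoff(e, l, x, a, mu, r, sigma, lam, rho, dt=1e-4, max_steps=10**7):
    """Euler approximation of (T*, C(x, pi, psi*, T*)) under the control (3.6).

    Under psi* the state follows phi' = -e(phi) for every pi, and the running
    cost is -lam + l(phi) - 0.5 * ((mu - r) / sigma)**2.
    """
    c2 = 0.5 * ((mu - r) / sigma) ** 2
    phi, t, cost = x, 0.0, 0.0
    for _ in range(max_steps):
        if phi <= a:
            return t, cost + rho  # ruin level reached: add the penalty rho
        cost += (-lam + l(phi) - c2) * dt
        phi -= e(phi) * dt
        t += dt
    raise RuntimeError("ruin level not reached; is e positive on [a, x]?")


def termination_bound(u_x, rho, lam, l_a, mu, r, sigma):
    """Right-hand side of (3.7), with u_x = U(x) and l_a = l(a)."""
    return (rho - u_x) / (lam - l_a + 0.5 * ((mu - r) / sigma) ** 2)
```

When $l$ is strictly decreasing the bound is strict, since replacing $l(\varphi(t))$ by $l(a)$ overstates the running cost along the path.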


4 Proof of Theorem 2.1

The proof of Theorem 2.1 uses some measure change arguments and is also influenced
by Varadhan's lemma. We show in two separate theorems that $U(x)$ is a lower (resp., upper)
bound of $\liminf_{n\to\infty} U^n(x)$ (resp., $\limsup_{n\to\infty} U^n(x)$), where $U^n(\cdot) := \inf_{\pi\in\Pi} J^n(\cdot,\pi)$.
Moreover, we show that the policy $\pi^*$ is asymptotically optimal.


4.1 Lower bound 

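Theorem 4.1 For every $x > a$ one has $\liminf_{n\to\infty} U^n(x) \ge U(x)$.


Proof: Recall that $U(x) = 0$ for every $x \ge d$. Since $l \ge 0$ and also $\rho > 0$ it follows from
Jensen's inequality that for any sequence of policies $\{\pi^n\}_n$, we have

$$\frac{1}{n}\ln E\left[ e^{\,n\left[\int_0^{\tau^n\wedge\tau_d^n} l(W^n(s))\,ds + \rho 1_{\{\tau^n<\tau_d^n\}}\right]} \right] \ge 0.$$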
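Fix $x \in (a,d)$ and fix an arbitrary sequence of policies $\{\pi^n\}_n \subset \Pi$. We show that for every
$\varepsilon > 0$ there is $N > 0$ such that for every $n > N$ one has $J^n(x,\pi^n) \ge U(x) - w_0(\varepsilon)$, where
$w_0(\varepsilon) \to 0$ as $\varepsilon \to 0$.

We start with some preliminaries. Let $\psi^*$ be the function from (3.6) and let

$$\dot\varphi^*(t) = \begin{cases} -e(\varphi^*(t)), & 0 \le t \le T^*, \\ -e(a), & T^* < t, \end{cases} \tag{4.1}$$

with $\varphi^*(0) = x$, where $T^*$ is the first time that $\varphi^*$ hits $a$, which is finite thanks to (3.2). Note
that up to time $T^*$, $\varphi^*$ is the state process of the differential game associated with $\psi^*$ and any
control $\pi \in \Pi$.

Let us fix $\varepsilon_1 > 0$. Since $l$ is Lipschitz, there exists $\gamma_1 > 0$ such that for every
$y, z \in [a,\infty)$

$$|y - z| < \gamma_1 \quad\text{implies}\quad |l(y) - l(z)| < \varepsilon_1. \tag{4.2}$$

Moreover, since $\varphi^*$ is continuous and for $t \ge T^*$, $\dot\varphi^*(t) < 0$, it follows that one may choose $\gamma_1$
such that

$$|\varphi - \varphi^*|_{T^*+2\varepsilon_1} < \gamma_1 \quad\text{implies}\quad |\tau_a[\varphi] - T^*| < \varepsilon_1, \tag{4.3}$$

where $\tau_a[\varphi] := \inf\{t \ge 0 : \varphi(t) = a\}$. Indeed, recall that $\varphi^*(0) = x \in (a,d)$. Now, since
$e(\cdot)$ is positive on $[a,d)$ we get from (4.1) that the state process $\varphi^*$ is strictly decreasing on
$[0, T^* + 2\varepsilon_1]$, touching $a$ only at $T^*$ and continuing to decrease on $[T^*, T^* + 2\varepsilon_1]$.

Define the probability measure $Q^* = Q^{*,n}$ on $(\Omega, \mathcal{F}_{T^*+2\varepsilon_1})$ by

$$\frac{dQ^*}{dP}(t) = e^{-\sqrt{n}\,\frac{\mu-r}{\sigma}B(t) - \frac{n}{2}\left(\frac{\mu-r}{\sigma}\right)^2 t}, \quad t \in [0, T^* + 2\varepsilon_1].$$

Then under $Q^*$, $B^*(t) = B^{*,n}(t) := B(t) + \sqrt{n}\,\frac{\mu-r}{\sigma}\,t$, $t \in [0, T^* + 2\varepsilon_1]$, is a standard Brownian
motion and

$$dW^n(t) = -e(W^n(t))\,dt + \frac{1}{\sqrt{n}}\,\sigma\pi^n(t)\,dB^*(t), \quad t \in [0, T^* + 2\varepsilon_1].$$

Now, since $|\pi^n(t)| \le M_1$, by Gronwall's inequality and Doob's martingale inequality
we get that there is a constant $C_1 > 0$ that depends on the Lipschitz constant of $e(\cdot)$ such that

$$Q^*\left((\mathcal{E}^n)^c\right) \le \frac{C_1 M_1^2}{n\gamma_1^2}, \tag{4.4}$$

where

$$\mathcal{E}^n := \left\{ \omega : \left|W^n(\cdot,\omega) - \varphi^*(\cdot)\right|_{T^*+2\varepsilon_1} < \gamma_1 \right\}.$$

Set $N = N(\varepsilon_1, \gamma_1, M_1, C_1)$ such that

$$N > \max\left\{ \frac{|\ln(\varepsilon_1)|}{\varepsilon_1},\ \frac{C_1 M_1^2}{\varepsilon_1\gamma_1^2},\ \frac{C_1 M_1^2 (T^* + 2\varepsilon_1)\left(\lambda + \frac{1}{2}\left(\frac{\mu-r}{\sigma}\right)^2\right)}{\varepsilon_1\gamma_1^2} \right\}. \tag{4.5}$$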
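We are now ready to bound $J^n(x,\pi^n)$ from below. Fix $n > N$. Writing $A^n(t) := \int_0^{\tau^n\wedge t} l(W^n(s))\,ds + \rho 1_{\{\tau^n<t\}}$ for brevity, we have

$$\begin{aligned}
\frac{1}{n}\ln E\left[\int_0^\infty e^{-\lambda n t}e^{nA^n(t)}\,dt\right]
&\ge \frac{1}{n}\ln E\left[\int_{T^*+\varepsilon_1}^{T^*+2\varepsilon_1} e^{-\lambda n t}e^{nA^n(t)}\,dt\right] \\
&\ge \frac{1}{n}\ln E\left[\varepsilon_1 e^{-\lambda n(T^*+2\varepsilon_1)}e^{nA^n(T^*+\varepsilon_1)}\right] \\
&= \frac{1}{n}\ln E^{Q^*}\left[\varepsilon_1 e^{-\lambda n(T^*+2\varepsilon_1)}e^{nA^n(T^*+\varepsilon_1)}\,\frac{dP}{dQ^*}(T^*+2\varepsilon_1)\right] \\
&\ge E^{Q^*}\left[-\lambda(T^*+2\varepsilon_1) + \int_0^{\tau^n\wedge(T^*+\varepsilon_1)} l(W^n(s))\,ds + \rho 1_{\{\tau^n<T^*+\varepsilon_1\}} - \frac{1}{2}\int_0^{T^*+2\varepsilon_1}(\dot\psi^*)^2(s)\,ds\right] - \varepsilon_1 \\
&\ge E^{Q^*}\left[\left(-\lambda(T^*+2\varepsilon_1) + \int_0^{\tau^n\wedge(T^*+\varepsilon_1)} l(W^n(s))\,ds + \rho 1_{\{\tau^n<T^*+\varepsilon_1\}} - \frac{1}{2}\int_0^{T^*+2\varepsilon_1}(\dot\psi^*)^2(s)\,ds\right)1_{\mathcal{E}^n}\right] - 2\varepsilon_1 \\
&\ge E^{Q^*}\left[\left(-\lambda(T^*+2\varepsilon_1) + \int_0^{T^*-\varepsilon_1}\left[l(\varphi^*(s)) - \varepsilon_1\right]ds + \rho - \frac{1}{2}\int_0^{T^*+2\varepsilon_1}(\dot\psi^*)^2(s)\,ds\right)1_{\mathcal{E}^n}\right] - 2\varepsilon_1 \\
&= \left(-\lambda(T^*+2\varepsilon_1) + \int_0^{T^*-\varepsilon_1}\left[l(\varphi^*(s)) - \varepsilon_1\right]ds + \rho - \frac{1}{2}\int_0^{T^*+2\varepsilon_1}(\dot\psi^*)^2(s)\,ds\right)Q^*(\mathcal{E}^n) - 2\varepsilon_1 \\
&= -\lambda T^* + \int_0^{T^*} l(\varphi^*(s))\,ds - \frac{1}{2}\int_0^{T^*}(\dot\psi^*)^2(s)\,ds + \rho + w_0(\varepsilon_1) \\
&= U(x) + w_0(\varepsilon_1),
\end{aligned}$$

where

$$\begin{aligned}
w_0(\varepsilon_1) = {}& \left(-2\lambda\varepsilon_1 - \int_{T^*-\varepsilon_1}^{T^*} l(\varphi^*(s))\,ds - \varepsilon_1(T^*-\varepsilon_1) - \frac{1}{2}\int_{T^*}^{T^*+2\varepsilon_1}(\dot\psi^*)^2(s)\,ds\right)Q^*(\mathcal{E}^n) - 2\varepsilon_1 \\
& - \left(-\lambda T^* + \int_0^{T^*} l(\varphi^*(s))\,ds - \frac{1}{2}\int_0^{T^*}(\dot\psi^*)^2(s)\,ds + \rho\right)\left(1 - Q^*(\mathcal{E}^n)\right).
\end{aligned}$$

The first three relations are easy to check. The fourth relation follows by Jensen's inequality
and by (4.4). The fifth relation follows since $l(\cdot) \ge 0$ and by (4.4) and (4.5). The sixth relation
follows by (4.2) and (4.3). The seventh relation follows since all the terms inside the expectation
besides the indicator are deterministic. Finally, the last relation follows by Proposition 3.2.
By (4.4) and (4.5) and recalling that $-\lambda T^* + \int_0^{T^*} l(\varphi^*(s))\,ds - \frac{1}{2}\int_0^{T^*}(\dot\psi^*)^2(s)\,ds + \rho = U(x) \ge 0$
we get that $w_0(\varepsilon_1) \to 0$ as $\varepsilon_1 \to 0$.

□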

□ 
4.2 Asymptotically optimal policy

In this section we show that the optimal policy in the game, which was defined in (2.6), is
an asymptotically optimal policy in the stochastic model. We start with a technical lemma
that serves us in the proof of the theorem that follows it. The lemma provides an upper bound
for the discounted cost by an alternative cost that is defined through a new measure $Q^n$. Set
$T := \rho/(\lambda - l(a))$. Both $Q^n$ and $T$ will play important roles during the proof of the theorem.


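Lemma 4.1 For every $n \in \mathbb{N}$, there exists a probability measure $Q^n \sim P$ on $[0,T]$, for which

\[
\begin{aligned}
&\limsup_{n\to\infty} \frac{1}{n}\ln E\Big[\int_0^\infty e^{-\lambda n t}\, e^{n\left(\int_0^{\tau^n\wedge t} l(W^n(s))\,ds + \rho 1_{\{\tau^n\le t\}}\right)}\,dt\Big] \\
&\quad\le \limsup_{n\to\infty} \Big( E^{Q^n}\Big[\sup_{0\le t\le T}\Big(\int_0^{\tau^n\wedge t} [l(W^n(s)) - \lambda]\,ds + \rho 1_{\{\tau^n\le t\}}\Big)\Big] - \frac{1}{n} H(Q^n\|P)\Big),
\end{aligned}
\tag{4.6}
\]

where $H(Q^n\|P) := E^{Q^n}\big[\ln \frac{dQ^n}{dP}\big]$ is the relative entropy of $Q^n$ w.r.t. $P$. Also, there is $N_1 \in \mathbb{N}$
such that for every $n > N_1$ one has

\[
\frac{1}{n} H(Q^n\|P) < 2\rho. \tag{4.7}
\]

Moreover, for every $n \in \mathbb{N}$, there exists an adapted process $(\psi^n(t))_{0\le t\le T}$ such that $Q^n$-almost
surely, $\psi^n(\cdot,\omega) \in AC_0[0,T]$ and $\int_0^T (\dot\psi^n)^2(s,\omega)\,ds < \infty$, and

\[
\frac{1}{n} H(Q^n\|P) = \frac{1}{2} E^{Q^n}\Big[\int_0^T (\dot\psi^n)^2(s)\,ds\Big]. \tag{4.8}
\]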


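Proof: First, notice that

\[
\begin{aligned}
&E\Big[\int_0^\infty e^{-\lambda n t}\, e^{n\left(\int_0^{\tau^n\wedge t} l(W^n(s))\,ds + \rho 1_{\{\tau^n\le t\}}\right)}\,dt\Big] \\
&\quad\le E\Big[\int_0^T e^{-\lambda n t}\, e^{n\left(\int_0^{\tau^n\wedge t} l(W^n(s))\,ds + \rho 1_{\{\tau^n\le t\}}\right)}\,dt + \int_T^\infty e^{-n(\lambda - l(a))t + n\rho}\,dt\Big] \\
&\quad= E\Big[\int_0^T e^{-\lambda n t}\, e^{n\left(\int_0^{\tau^n\wedge t} l(W^n(s))\,ds + \rho 1_{\{\tau^n\le t\}}\right)}\,dt + \frac{1}{n(\lambda - l(a))}\, e^{n(\rho + T(l(a) - \lambda))}\Big] \\
&\quad\le T\, E\Big[\sup_{0\le t\le T} e^{n\left(\int_0^{\tau^n\wedge t} [l(W^n(s)) - \lambda]\,ds + \rho 1_{\{\tau^n\le t\}}\right)}\Big] + \frac{1}{n(\lambda - l(a))}.
\end{aligned}
\]
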


The first inequality follows since $l$ is non-increasing and since by eliminating the indicator in
the second exponent we only increase the cost. The second inequality follows by the choice of
$T$ and since $-\lambda n t \le -\lambda n(\tau^n \wedge t)$. The other relations are easy to see.
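
The role of the choice $T = \rho/(\lambda - l(a))$ in the tail term can be checked numerically. The following is a minimal sketch, not part of the original argument: the parameter values and the function names are illustrative assumptions. It compares the closed form of $\int_T^\infty e^{-n(\lambda - l(a))t + n\rho}\,dt$ with a trapezoidal approximation and confirms that, for this $T$, the exponent $n(\rho + T(l(a) - \lambda))$ vanishes so the tail equals $1/(n(\lambda - l(a)))$.

```python
import math


def tail_closed_form(n, lam, l_a, rho, T):
    """Closed form of the integral of exp(-n(lam - l_a) t + n rho) over [T, infinity)."""
    rate = n * (lam - l_a)
    return math.exp(n * rho - rate * T) / rate


def tail_numeric(n, lam, l_a, rho, T, width=50.0, steps=200_000):
    """Trapezoidal approximation of the same integral, truncated once the
    integrand has decayed by a factor exp(-width)."""
    rate = n * (lam - l_a)
    upper = T + width / rate
    h = (upper - T) / steps
    total = 0.0
    for k in range(steps + 1):
        t = T + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(n * rho - rate * t)
    return total * h


# Illustrative (assumed) parameter values with lam > l_a.
n, lam, l_a, rho = 5, 1.0, 0.4, 0.3
T = rho / (lam - l_a)  # the choice of T made in the text

closed = tail_closed_form(n, lam, l_a, rho, T)
numeric = tail_numeric(n, lam, l_a, rho, T)
```

With any other horizon the factor $e^{n(\rho + T(l(a) - \lambda))}$ would either blow up or decay exponentially in $n$; the specific $T$ makes the tail contribution of order $1/n$.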
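Now, for every $n \in \mathbb{N}$ let $Q^n$ be the measure that satisfies

\[
\frac{1}{n}\ln E\Big[\sup_{0\le t\le T} e^{n\left(\int_0^{\tau^n\wedge t} [l(W^n(s)) - \lambda]\,ds + \rho 1_{\{\tau^n\le t\}}\right)}\Big]
= E^{Q^n}\Big[\sup_{0\le t\le T}\Big(\int_0^{\tau^n\wedge t} [l(W^n(s)) - \lambda]\,ds + \rho 1_{\{\tau^n\le t\}}\Big)\Big] - \frac{1}{n} H(Q^n\|P).
\tag{4.9}
\]
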


The existence of the measure with the above representation is justified by the following
argument: It follows from Jensen's inequality that for any measure $Q \sim P$ and

\[
f = n \sup_{0\le t\le T}\Big(\int_0^{\tau^n\wedge t} [l(W^n(s)) - \lambda]\,ds + \rho 1_{\{\tau^n\le t\}}\Big)
\]

one has

\[
\ln E\big[e^f\big] = \ln\Big(E^{Q}\Big[e^f \frac{dP}{dQ}\Big]\Big) \ge E^{Q}[f] - H(Q\|P),
\]

where equality holds for the measure $Q$ that satisfies

\[
\frac{dQ}{dP} = \frac{e^f}{E[e^f]}. \tag{4.10}
\]

By point i of Lemma 3.1, Theorem 4.1 and (4.9) we get that

\[
0 \le U(x) \le \liminf_{n\to\infty} J^n(x,\pi^*) \le \liminf_{n\to\infty}\Big(\frac{1}{n}E^{Q^n}[f] - \frac{1}{n}H(Q^n\|P)\Big) \le \rho - \limsup_{n\to\infty}\Big(\frac{1}{n}H(Q^n\|P)\Big),
\]

where the last inequality follows since $\lambda > l(\cdot)$. Hence, (4.7) holds.
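
The variational identity used above, $\ln E[e^f] \ge E^Q[f] - H(Q\|P)$ with equality at $dQ/dP = e^f/E[e^f]$, can be illustrated on a finite probability space. The sketch below is only a sanity check under assumed data: the three-point space, the values of `p` and `f`, and the function names are hypothetical and are not taken from the model.

```python
import math


def log_mgf(p, f):
    """ln E_P[e^f] on a finite space with weights p."""
    return math.log(sum(pi * math.exp(fi) for pi, fi in zip(p, f)))


def relative_entropy(q, p):
    """H(Q||P) = sum_i q_i ln(q_i / p_i), with the convention 0 ln 0 = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def tilted_measure(p, f):
    """The measure with density e^f / E_P[e^f] with respect to P."""
    z = sum(pi * math.exp(fi) for pi, fi in zip(p, f))
    return [pi * math.exp(fi) / z for pi, fi in zip(p, f)]


def variational_value(q, p, f):
    """E_Q[f] - H(Q||P), the right-hand side of the Jensen bound."""
    return sum(qi * fi for qi, fi in zip(q, f)) - relative_entropy(q, p)


# Hypothetical three-point example.
p = [0.2, 0.5, 0.3]
f = [1.0, -0.5, 2.0]

q_star = tilted_measure(p, f)
other_q = [0.3, 0.3, 0.4]

optimum = variational_value(q_star, p, f)   # attains ln E_P[e^f]
suboptimal = variational_value(other_q, p, f)
target = log_mgf(p, f)
```

Any other choice of $Q$ gives a strictly smaller value, which is why $Q^n$ in (4.9) is taken to be the exponentially tilted measure.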

We now turn to the last part of the lemma. Since the r.h.s. of (4.10) conditioned on $\mathcal{F}_t$ is
a positive $P$-martingale, it can be expressed as an exponential martingale. That is, there
is a predictable and square integrable process $(u^n(t))_{0\le t\le T}$ such that

\[
E\Big[\frac{dQ^n}{dP}\,\Big|\,\mathcal{F}_t\Big] = e^{\sqrt{n}\int_0^t u^n(s)\,dB(s) - \frac{n}{2}\int_0^t (u^n)^2(s)\,ds}, \qquad t \in [0,T].
\]

Now, for $Q^n$-almost every $\omega$ define $\psi^n(\cdot,\omega)$ as the Lebesgue integral of $u^n(\cdot,\omega)$.

Finally, notice that under $Q^n$, $B(t) - \sqrt{n}\,\psi^n(t)$, $t \in [0,T]$, is a Brownian motion and therefore
(4.8) holds.

□ 


Theorem 4.2 For every $x > a$ one has $\limsup_{n\to\infty} J^n(x,\pi^*) \le U(x)$, where $\pi^*$ is defined
in (2.6).

Proof: Fix $x > a$. Notice that by (4.6) it is sufficient to bound the limsup of its r.h.s. by
$U(x)$.
During the proof we will make use of the state process and the wealth process under the
control $\pi^*$, which was defined via the function $\pi^*$, see (2.6). Consider $Q^n$ and $\psi^n$ from Lemma
4.1. Under $Q^n$,

\[
dW^n(t) = \big[-c(W^n(t)) + (\mu - r)\pi^*(W^n(t)) + \sigma\pi^*(W^n(t))\dot\psi^n(t)\big]\,dt + \frac{1}{\sqrt{n}}\,\sigma\pi^*(W^n(t))\,dB^n(t),
\tag{4.11}
\]

$t \in (0,T]$ and $W^n(0) = x$, where $B^n(t) := B(t) - \sqrt{n}\,\psi^n(t)$ is a standard Brownian motion
under $Q^n$. Also, set

\[
\dot\varphi^n(t) = -c(\varphi^n(t)) + (\mu - r)\pi^*(\varphi^n(t)) + \sigma\pi^*(\varphi^n(t))\dot\psi^n(t), \qquad t \in [0,T], \tag{4.12}
\]

$\varphi^n(0) = x$.


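For any $\delta > 0$, we define

\[
A_n(\delta) = \big\{\omega : |W^n - \varphi^n|_{\tau_a[W^n]\wedge T} \le \delta\big\}, \tag{4.13}
\]

where $\tau_a[h] := \inf\{t \in [0,T] : h(t) \le a\}$ with the convention that $\inf\emptyset = \infty$. Let us write

\[
\begin{aligned}
&E^{Q^n}\Big[\sup_{0\le t\le T}\Big(\int_0^{\tau_a[W^n]\wedge t} \big[l(W^n(s)) - \lambda - \tfrac{1}{2}(\dot\psi^n(s))^2\big]\,ds + \rho 1_{\{\tau_a[W^n]\le t\}}\Big)\Big] \\
&\quad= E^{Q^n}\Big[\sup_{0\le t\le T}\Big(\int_0^{\tau_a[W^n]\wedge t} \big[l(W^n(s)) - \lambda - \tfrac{1}{2}(\dot\psi^n(s))^2\big]\,ds + \rho 1_{\{\tau_a[W^n]\le t\}}\Big) 1_{A_n(\delta)}\Big] \\
&\qquad+ E^{Q^n}\Big[\sup_{0\le t\le T}\Big(\int_0^{\tau_a[W^n]\wedge t} \big[l(W^n(s)) - \lambda - \tfrac{1}{2}(\dot\psi^n(s))^2\big]\,ds + \rho 1_{\{\tau_a[W^n]\le t\}}\Big) 1_{(A_n(\delta))^c}\Big].
\end{aligned}
\tag{4.14}
\]
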


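For $\omega \in A_n(\delta)$, we have $\tau_a[W^n] \ge \tau_{a+\delta}[\varphi^n]$. Then there is a constant $c_1 > 0$ that depends on
the Lipschitz constant of $l(\cdot)$ such that on $A_n(\delta)$ and for every $t \in [0,T]$,

\[
\begin{aligned}
&\int_0^{\tau_a[W^n]\wedge t} \big[l(W^n(s)) - \lambda - \tfrac{1}{2}(\dot\psi^n(s))^2\big]\,ds + \rho 1_{\{\tau_a[W^n]\le t\}} \\
&\quad\le \int_0^{\tau_a[W^n]\wedge t} \big[l(\varphi^n(s)) - \lambda - \tfrac{1}{2}(\dot\psi^n(s))^2\big]\,ds + \rho 1_{\{\tau_a[W^n]\le t\}} + c_1\delta \\
&\quad\le \int_0^{\tau_{a+\delta}[\varphi^n]\wedge t} \big[l(\varphi^n(s)) - \lambda - \tfrac{1}{2}(\dot\psi^n(s))^2\big]\,ds + \rho 1_{\{\tau_{a+\delta}[\varphi^n]\le t\}} + c_1\delta \\
&\quad\le U_{a+\delta}(x) + c_1\delta.
\end{aligned}
\]

Here we denote by $U_{a+\delta}(x)$ the function defined by (2.4) with $a$ replaced by $a + \delta$. The last
inequality follows since the optimal control $\pi^*$ defined by (2.6) is, according to its explicit form,
the optimal control also for the differential game with $a$ replaced by $a + \delta$. Therefore,

\[
E^{Q^n}\Big[\sup_{0\le t\le T}\Big(\int_0^{\tau_a[W^n]\wedge t} \big[l(W^n(s)) - \lambda - \tfrac{1}{2}(\dot\psi^n(s))^2\big]\,ds + \rho 1_{\{\tau_a[W^n]\le t\}}\Big) 1_{A_n(\delta)}\Big]
\le (U_{a+\delta}(x) + c_1\delta)\,Q^n(A_n(\delta)).
\]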
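On the other hand,

\[
E^{Q^n}\Big[\sup_{0\le t\le T}\Big(\int_0^{\tau_a[W^n]\wedge t} \big[l(W^n(s)) - \lambda - \tfrac{1}{2}(\dot\psi^n(s))^2\big]\,ds + \rho 1_{\{\tau_a[W^n]\le t\}}\Big) 1_{(A_n(\delta))^c}\Big]
\le \rho\, Q^n((A_n(\delta))^c).
\]

Plugging the last two inequalities into (4.14) we conclude

\[
\begin{aligned}
&E^{Q^n}\Big[\sup_{0\le t\le T}\Big(\int_0^{\tau_a[W^n]\wedge t} \big[l(W^n(s)) - \lambda - \tfrac{1}{2}(\dot\psi^n(s))^2\big]\,ds + \rho 1_{\{\tau_a[W^n]\le t\}}\Big)\Big] \\
&\quad\le U_{a+\delta}(x) + c_1\delta + (\rho - U_{a+\delta}(x) - c_1\delta)\,Q^n((A_n(\delta))^c).
\end{aligned}
\]
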


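In the following, we shall show that

\[
\lim_{n\to\infty} Q^n((A_n(\delta))^c) = 0, \tag{4.15}
\]

from which it follows that

\[
\limsup_{n\to\infty} E^{Q^n}\Big[\sup_{0\le t\le T}\Big(\int_0^{\tau_a[W^n]\wedge t} \big[l(W^n(s)) - \lambda - \tfrac{1}{2}(\dot\psi^n(s))^2\big]\,ds + \rho 1_{\{\tau_a[W^n]\le t\}}\Big)\Big]
\le U_{a+\delta}(x) + c_1\delta. \tag{4.16}
\]

Notice that by Proposition 3.1 for every $x > a$, $U_a(x)$ is continuous as a function of $a$. Letting
$\delta \to 0$ in (4.16) we get

\[
\limsup_{n\to\infty} E^{Q^n}\Big[\sup_{0\le t\le T}\Big(\int_0^{\tau_a[W^n]\wedge t} \big[l(W^n(s)) - \lambda - \tfrac{1}{2}(\dot\psi^n(s))^2\big]\,ds + \rho 1_{\{\tau_a[W^n]\le t\}}\Big)\Big]
\le U_a(x) = U(x),
\]

which is what we want to prove.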

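We now show that (4.15) holds. By (4.11) and (4.12), we have for every $t \in [0,T]$

\[
\begin{aligned}
W^n(t) - \varphi^n(t) = \int_0^t \Big[ & -\big(c(W^n(s)) - c(\varphi^n(s))\big) + (\mu - r)\big(\pi^*(W^n(s)) - \pi^*(\varphi^n(s))\big) \\
& + \sigma\big(\pi^*(W^n(s)) - \pi^*(\varphi^n(s))\big)\dot\psi^n(s)\Big]\,ds + \xi^n(t),
\end{aligned}
\]

where

\[
\xi^n(t) := \frac{1}{\sqrt{n}}\int_0^t \sigma\pi^*(W^n(s))\,dB^n(s), \qquad t \in [0,T].
\]

Since $c(\cdot)$ and $\pi^*(\cdot)$ are Lipschitz, there is a constant $c_2 > 0$ such that for every $t \in [0,T]$

\[
|W^n(t) - \varphi^n(t)| \le c_2\int_0^t (1 + |\dot\psi^n(s)|)\,|W^n(s) - \varphi^n(s)|\,ds + |\xi^n(t)|.
\]
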
Jo 
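By Gronwall's inequality it follows that for every $t \in [0,T]$

\[
|W^n(t) - \varphi^n(t)| \le e^{c_2\int_0^t (1 + |\dot\psi^n(s)|)\,ds}\,\sup_{0\le s\le t}|\xi^n(s)|.
\]

Therefore,

\[
|W^n - \varphi^n|_{\tau_a[W^n]\wedge T} \le e^{c_2\int_0^T (1 + |\dot\psi^n(s)|)\,ds}\,|\xi^n|_{\tau_a[W^n]\wedge T}. \tag{4.17}
\]

For any $K > T$, consider

\[
B_n(K) = \Big\{\omega : \int_0^T (1 + |\dot\psi^n(s)|)\,ds \le K\Big\}. \tag{4.18}
\]
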

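Clearly,

\[
(A_n(\delta))^c = \big((A_n(\delta))^c \cap B_n(K)\big) \cup \big((A_n(\delta))^c \cap (B_n(K))^c\big).
\]

From (4.13), (4.17), and (4.18), it follows that

\[
(A_n(\delta))^c \cap B_n(K) \subset \big\{\omega : |\xi^n|_{\tau_a[W^n]\wedge T} > \delta e^{-c_2 K}\big\}.
\]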

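By Doob's martingale inequality we have

\[
E^{Q^n}\big[|\xi^n|^2_{\tau_a[W^n]\wedge T}\big] \le 4\,E^{Q^n}\big[(\xi^n(\tau_a[W^n]\wedge T))^2\big].
\]

Thus,

\[
Q^n\big((A_n(\delta))^c \cap B_n(K)\big) \le \frac{e^{2c_2K}}{\delta^2}\,E^{Q^n}\big[|\xi^n|^2_{\tau_a[W^n]\wedge T}\big]
\le \frac{4e^{2c_2K}}{\delta^2}\,E^{Q^n}\big[(\xi^n(\tau_a[W^n]\wedge T))^2\big]
\le \frac{c_2 T e^{2c_2K}}{n\delta^2}.
\]

On the other hand,

\[
Q^n\big((A_n(\delta))^c \cap (B_n(K))^c\big) \le Q^n\big((B_n(K))^c\big)
\le Q^n\Big(\int_0^T |\dot\psi^n(s)|^2\,ds > \frac{(K-T)^2}{T}\Big)
\le \frac{T}{(K-T)^2}\,E^{Q^n}\Big[\int_0^T |\dot\psi^n(s)|^2\,ds\Big],
\]

where the second inequality follows from (4.18). Due to (4.7) and (4.8), there is $N_1 > 0$ such
that for every $n > N_1$, we have

\[
E^{Q^n}\Big[\int_0^T |\dot\psi^n(s)|^2\,ds\Big] < 4\rho.
\]

Hence, for every $n > N_1$, we have

\[
Q^n\big((A_n(\delta))^c \cap (B_n(K))^c\big) \le \frac{4\rho T}{(K-T)^2}.
\]

Fix $\varepsilon > 0$. Let $K > T$ be such that

\[
\frac{4\rho T}{(K-T)^2} \le \frac{\varepsilon}{2}.
\]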
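Let $N_2$ be such that the bound on $Q^n((A_n(\delta))^c \cap B_n(K))$ obtained above is smaller than $\varepsilon/2$ for every $n > N_2$. Take $N = \max\{N_1, N_2\}$. Then for every $n > N$, we have

\[
Q^n((A_n(\delta))^c) < \varepsilon.
\]

This implies (4.15).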


□ 
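
The mechanism behind (4.15), namely that the noise term of order $1/\sqrt{n}$ forces $W^n$ to stay close to the deterministic path $\varphi^n$, can be visualised with a small simulation. This is a rough sketch under loud assumptions: the functions `consumption` and `policy` are hypothetical Lipschitz stand-ins for $c(\cdot)$ and $\pi^*(\cdot)$, the numerical parameters are invented, and the control is taken as $\dot\psi^n \equiv 0$ for simplicity. It uses a plain Euler scheme and is not the construction used in the proof.

```python
import math
import random


def consumption(x):
    # Hypothetical Lipschitz drift term standing in for c(x); not from the paper.
    return 0.3 + 0.1 * x


def policy(x):
    # Hypothetical Lipschitz feedback policy standing in for pi*(x).
    return 0.5 * x


def sup_deviation(n, x0=2.0, a=1.0, horizon=1.0, steps=1000,
                  mu_minus_r=0.04, sigma=0.2, seed=0):
    """Euler scheme for W^n (noise scaled by 1/sqrt(n)) and for the noiseless
    path phi, with psi-dot set to zero. Returns the largest |W^n - phi|
    observed before W^n falls to the level a or the horizon is reached."""
    rng = random.Random(seed)
    dt = horizon / steps
    w = phi = x0
    worst = 0.0
    for _ in range(steps):
        if w <= a:
            break  # stop at the ruin level, mimicking tau_a[W^n]
        db = rng.gauss(0.0, math.sqrt(dt))
        w_drift = -consumption(w) + mu_minus_r * policy(w)
        phi_drift = -consumption(phi) + mu_minus_r * policy(phi)
        w = w + w_drift * dt + sigma * policy(w) * db / math.sqrt(n)
        phi = phi + phi_drift * dt
        worst = max(worst, abs(w - phi))
    return worst


# Same random seed for both runs, so only the noise scaling differs.
deviation_small_n = sup_deviation(100)
deviation_large_n = sup_deviation(10_000)
```

In this toy setting the deviation for the larger $n$ is roughly a factor $\sqrt{100}$ smaller, in line with the $1/n$ rate of the second-moment bound obtained from Doob's inequality.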